-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

SGML
(Standard Generalized Markup Language)

SGML is a metalanguage suitable for describing all kinds of markup languages, including HTML. It is also an International Organization for Standardization standard (ISO 8879) for specifying, defining, and creating documents that are independent of platform and display differences that are irrelevant to the delivery and rendering of those documents' contents. In other words, SGML is a language for formally defining document types (classes or kinds of documents) and document instances (where a kind of document is implemented with its own unique content). For instance, a Creole cookbook and an Italian cookbook are both obviously cookbooks (type), but each contains vastly different recipes (instances).

URLs:

SGML resources: A hotlist to numerous SGML resources, including most of the others mentioned here.
SGML intro: Another gem from the Web Weaver's Warren, this is a set of introductory and explanatory resources on SGML, including a terrific introduction.
World Tour of SGML: A pointer to a Web page describing SoftQuad's "World Tour of SGML" CD-ROM, a comprehensive SGML reference (that you can order on-line, if you choose). SoftQuad is a leading vendor of SGML and HTML authoring tools, and the "World Tour" is itself a pretty fine tool.
SGML resources: An overview of SGML and related resources, including pointers to a number of highly readable introduction and overview documents on the subject.
SGML vendors: SGML Open is a consortium of vendors involved in delivering or supporting SGML technology, and an excellent resource for information on consultants, products, training, and other relevant materials.
SGML - Robin Cover: This is Robin Cover's home page on SGML, widely regarded as the best place to start searching for further information on SGML. Cover has done an excellent job of bringing all the best SGML resources together in one place.

Print Resources:

Practical SGML, Second Edition, Eric van Herwijnen, Kluwer Academic Publishers, Boston, 1994 (ISBN 0-7923-9434-8).
Readme.1st: SGML for Writers and Editors, Ronald C. Turner et al., Prentice Hall, Upper Saddle River, NJ, 1996 (ISBN 0-13-432717-9).
The SGML Handbook, Charles F. Goldfarb, Oxford University Press, Midsomer Norton, Avon, UK, 1993 (ISBN 0-19-853737-9).

W3E References:

Document Type Definition (DTD)
HTML
markup

Detail:

SGML is an international standard used for the formal definition of device-, system-, and application-independent electronic text. In other words, SGML is a metalanguage--a language used to describe other languages--that formally defines a descriptive markup language. A descriptive markup language is one that uses explicit markup, called tags, to describe what a document's structure and function are, rather than forcing that document to behave in certain ways (as is the case with most word processors, layout programs, and other software used to create documents for everyday use).

The power of SGML comes from its structure-driven approach to describing the contents of a document. In fact, SGML identifies each part of a document by its purpose and role. SGML does not describe a document's appearance; it leaves presentation of documents to browsers and print-formatting applications.

SGML originated with work begun at IBM in the 1960s to overcome the problems inherent in moving documents among multiple hardware platforms and operating systems. IBM's efforts were called GML, for Generalized Markup Language. GML was originally targeted for internal use at IBM rather than as a generic way of representing documents. This was the first publish-once, multiplatform strategy for document preparation--a strategy that's become essential for managing large, complex publication environments today.

GML's originators--Charles Goldfarb, Ed Mosher, and Ray Lorie (the original "GML")--realized by the early 1970s that a more general form of markup would make documents portable from any one system to any other. This work led in the 1980s to the definition and birth of SGML, which is governed today by the ISO 8879 standard and used around the world.

SGML is a powerful and complex tool for representing documents of all kinds. It offers the ability to create specifications for many types of documents; these specifications can then be used to define and build individual document instances that conform to them.

A large variety of government agencies, vendor consortia, and industry organizations have adopted SGML. The Department of Defense (DoD), for instance, mandates that all documentation be submitted in a format that complies with the CALS standard MIL-M-28001B. CALS (Continuous Acquisition and Life-cycle Support) is a DoD initiative to promote electronic document interchange between itself and its many contractors and subcontractors. MIL-M-28001B specifies formal Document Type Definitions (DTDs) for technical manuals in the required format. This lets Yoyodyne, Inc. develop its documentation on a Linux system using troff, but guarantees that the procurement clerk in the Pentagon who's running a Sun workstation using an ArborText system will be able to read, print, modify, and abstract its contents without going through a lot of contortions.

The guiding principle behind SGML is based on a concept called "markup"--namely, that text and word processing systems typically require additional information to be included with the natural text that makes up the content of a document. This added information serves two basic functions:

to separate logical elements within a document
to specify the various types of processing functions to be applied to those elements (e.g., bolding, italicizing, and font changes).

The use of "generalized markup" is what makes SGML's document definition capabilities so all-encompassing and powerful. A quote from Charles Goldfarb's classic tome on the subject, The SGML Handbook, explains this concept wonderfully (pp. 7-8):

"generalized markup" ...does not restrict documents to a single application, formatting style, or processing system. Generalized markup is based on two novel postulates:

Markup should describe a document's structure and other attributes rather than specify processing to be performed on it, as descriptive markup need be done only once and will suffice for all future processing.
Markup should be rigorous so that the techniques available for processing rigorously-defined objects such as programs and databases can be used for processing documents as well.

The key ideas here are that markup needs to be applied only once yet can drive multiple forms of output ("all future processing"), and that markup must be rigorous enough to support computerized parsing, data manipulation, and programmatic transformations or output. SGML delivers these capabilities in spades.

The heart of an SGML-based document is its governing Document Type Definition, or DTD. A DTD lays out the structural elements and markup definitions for a document, which can then be used to create actual document instances (which supply content that's organized within the framework of the DTD). This is why SGML is often described as a form of "descriptive markup"--meaning that it describes the elements in and organization of a document thoroughly, without necessarily addressing how such elements are presented. Most typical word processors, by contrast, use "procedural markup," which intermixes presentation, structure, and organization with content.
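
As a rough illustration of how a DTD governs its document instances, here is a minimal Python sketch. It is not a real SGML parser: the cookbook element names and the CONTENT_MODELS table are hypothetical, the content models are approximated with regular expressions, and the instances are written as XML-style markup so that Python's standard xml.etree.ElementTree module can read them.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical content models, loosely mirroring SGML declarations such as
#   <!ELEMENT recipe - - (title, ingredient+, step+)>
# Each model is a regular expression over the sequence of child element
# names, where every name is followed by one space.
CONTENT_MODELS = {
    "cookbook": r"(recipe )+",
    "recipe": r"title (ingredient )+(step )+",
    "title": r"",        # text only, no child elements
    "ingredient": r"",
    "step": r"",
}


def validate(element):
    """Return a list of error strings; an empty list means the instance conforms."""
    model = CONTENT_MODELS.get(element.tag)
    if model is None:
        return [f"undeclared element: {element.tag}"]
    errors = []
    # Encode the children as "name name name " so the regex matches whole names.
    children = "".join(child.tag + " " for child in element)
    if re.fullmatch(model, children) is None:
        found = children.strip() or "(empty)"
        errors.append(f"<{element.tag}> has invalid content: {found}")
    for child in element:
        errors.extend(validate(child))
    return errors


# Two instances of the same (assumed) cookbook document type.
CREOLE = """<cookbook>
  <recipe>
    <title>Gumbo</title>
    <ingredient>okra</ingredient>
    <ingredient>andouille sausage</ingredient>
    <step>Make a dark roux.</step>
  </recipe>
</cookbook>"""

BROKEN = "<cookbook><recipe><title>Risotto</title></recipe></cookbook>"

print(validate(ET.fromstring(CREOLE)))
print(validate(ET.fromstring(BROKEN)))
```

The first instance conforms, so validate returns an empty list. The second is rejected because its recipe has a title but no ingredients or steps. A Creole cookbook and an Italian cookbook would simply be two different instances checked against the same table.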

The inability to separate presentation description from structure and organization is what makes word processor files so dependent on their particular applications--without the program that "understands" these peculiar formats, such files are largely unintelligible. SGML documents, on the other hand, can be read and "understood" by any system that can handle general SGML parsing, so long as the document's governing DTD is available to provide definitions for the document's structural and organizational elements. This also means that specific "output DTDs" can be created, to permit the same document to be presented in a variety of ways, tailored for specific media. Output DTDs let the same document instance be rendered differently (and customized completely) for hard copy output, CD-ROM delivery, and presentation via the World-Wide Web, among other uses. In a very small nutshell, this capability explains why so many organizations are moving to adopt SGML as the document description technology driving their publication processes.
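
The idea of rendering one instance for several media can be sketched in a few lines of Python. This is a simplification, not how an output DTD is actually written: the recipe markup and the HTML_STYLE and TEXT_STYLE tables are hypothetical stand-ins, and each table merely maps an element name to the text placed before and after that element's content.

```python
import xml.etree.ElementTree as ET
from html import escape

# One document instance (hypothetical element names, XML-style syntax).
RECIPE = (
    "<recipe><title>Gumbo</title>"
    "<ingredient>okra</ingredient>"
    "<step>Make a dark roux.</step></recipe>"
)

# Assumed per-medium tables: element name -> (prefix, suffix).
HTML_STYLE = {
    "recipe": ("<div>", "</div>"),
    "title": ("<h1>", "</h1>"),
    "ingredient": ("<p><b>", "</b></p>"),
    "step": ("<p>", "</p>"),
}
TEXT_STYLE = {
    "recipe": ("", ""),
    "title": ("# ", "\n"),
    "ingredient": ("* ", "\n"),
    "step": ("", "\n"),
}


def render(element, style, escape_text=lambda text: text):
    """Walk the element tree and wrap each element as the style table dictates."""
    prefix, suffix = style[element.tag]
    parts = [prefix, escape_text(element.text or "")]
    for child in element:
        parts.append(render(child, style, escape_text))
        # Text that follows a child element still belongs to this element.
        parts.append(escape_text(child.tail or ""))
    parts.append(suffix)
    return "".join(parts)


doc = ET.fromstring(RECIPE)
html_out = render(doc, HTML_STYLE, escape)  # escape &, <, > for the Web
text_out = render(doc, TEXT_STYLE)          # plain text needs no escaping

print(html_out)
print(text_out)
```

The instance itself never changes; only the table handed to render does. Adding a further medium means adding another table, not editing the document.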

Here's a short list of some of the agencies, consortia, and organizations that have produced standard DTDs:

the US Department of Defense (CALS and the MIL-M-28001B specification)
the Association of American Publishers (AAP) developed the National Standard for Electronic Manuscript Preparation and Markup, a general-purpose, book-oriented DTD for publication use.
the Air Transport Association (ATA), a consortium of commercial airlines and related companies, has developed multiple DTDs based on the ATA-100 specification (the European airline association, AECMA, has similar efforts underway).
the Davenport Group, a discussion forum composed of software developers and software-oriented publishers, has developed the DocBook DTD specifically for computer software manuals and programmers' references.
the Pinnacles Initiative seeks to define an information interchange standard that will enable electronic components manufacturers to create Electronic Data Books that include all of the information necessary to facilitate the design and support of an electronic or mechanical component.
the Society of Automotive Engineers (SAE) has defined an SAE J2008 DTD for electronic interchange of diagnostic and repair information.
the Telecommunications Industry Forum (TCIF) is an international association of carriers and major telecom vendors, whose initiative seeks to support the re-use of technical information across multiple applications and processing environments.

Because SGML fosters a platform- and software-independent means of defining and exchanging documentation, it has become a representational tool of choice whenever multiple partners must exchange documents (especially large, complex ones). But because SGML also supports rigorous document definitions and descriptions, it is becoming a preferred tool for purely in-house publishing needs. Not only does SGML support computerized validation (to make sure a document conforms entirely to its governing DTD), it also permits the same document to be used to create a variety of forms of output. Since it's so much easier to maintain only one version of information than to try to synchronize multiple (and incompatible) versions, the move toward SGML for corporate publishing environments is gaining considerable momentum.

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

E-Mail: The World Wide Web Encyclopedia at wwwe@tab.com
E-Mail: Charles River Media at chrivmedia@aol.com
Copyright 1996 Charles River Media. All rights reserved.
Text - Copyright © 1995, 1996 - James Michael Stewart & Ed Tittel.
Web Layout - Copyright © 1995, 1996 - LANWrights & IMPACT Online.
Revised -- February 20th, 1996 [James Michael Stewart - WebMaster - IMPACT Online]
